Merge OpenAI Triton commit 152ef2d (#2583)
Merged
Conversation
This PR adds a `fast_expf` operator under libdevice for AMD hardware. Aligning with other operators in the exp family, the handling of denormal inputs is controlled by `__HIP_FTZ`, which is currently fixed to 1:
- If `__HIP_FTZ = 1`, the operator uses `llvm.amdgcn.exp2.f32`, which flushes denormals in inputs and outputs.
- If `__HIP_FTZ = 0`, the operator uses `llvm.exp2.f32`, which does not flush denormals.

Ref: https://github.com/ROCm/llvm-project/blob/amd-staging/amd/device-libs/cuda2gcn/src/precision.cl

Fixes ROCm/triton-internal#314
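The flush-to-zero (FTZ) distinction can be sketched in plain Python. This is a conceptual model of the semantics described above, not the actual intrinsic implementation; the helper names are hypothetical:

```python
import struct

def flush_denorm_f32(x: float) -> float:
    """Flush an f32 denormal to zero (FTZ), preserving sign.

    f32 denormals are nonzero values with magnitude below 2**-126.
    """
    if x != 0.0 and abs(x) < 2.0 ** -126:
        return -0.0 if x < 0.0 else 0.0
    return x

def exp2_f32(x: float, ftz: bool) -> float:
    """Model exp2 with optional flush-to-zero on input and output."""
    if ftz:
        x = flush_denorm_f32(x)
    y = 2.0 ** x
    # Round-trip through f32 to mimic single-precision output.
    y = struct.unpack("f", struct.pack("f", y))[0]
    if ftz:
        y = flush_denorm_f32(y)
    return y
```

With FTZ on, a denormal input is treated as zero (so `exp2` of it is 1.0), and a denormal result is flushed to zero; with FTZ off, both are kept.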
…lds of `Autotuner` (#4921) Motivation: https://github.com/triton-lang/triton/pull/4496/files#r1801756225 Signed-off-by: Anatoly Myachev <[email protected]>
… use it for vectorized atomics (#4982) Vectorized atomics on NVIDIA (triton-lang/triton#4971) are only available on Hopper (>=sm90) and PTX >= 8.1. It's possible to be running with PTX 8.0 on a Hopper machine. This PR passes ptx-version to the ttgir->llir conversion pass for NVIDIA, and uses the ptx version to determine whether vectorized atomics should be used.
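The gating condition described above amounts to a simple conjunction. A hypothetical sketch (names illustrative, not Triton's actual API):

```python
def can_use_vectorized_atomics(compute_capability: int, ptx_version: int) -> bool:
    """Vectorized atomics need Hopper (>= sm90) and PTX ISA >= 8.1.

    compute_capability is e.g. 90 for sm90; ptx_version is e.g. 81 for PTX 8.1.
    """
    return compute_capability >= 90 and ptx_version >= 81
```

This captures the case the PR fixes: a Hopper machine running with PTX 8.0 must not take the vectorized path.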
`add_optimize_dot_operands` may introduce an immutable shared buffer for transposed dot operands. Our stream pipeliner then replaces the immutable buffer with a mutable one so it can be reused across iterations (pre-fetching). This produces incorrect transOps, because the input is mutable while the result is immutable. This PR rewrites those transOps to output a mutable layout.
…CES` is set (#4986) Based on the feedback from AMD, the device mapping problem has to be addressed by the ROCm team, so we emit an error for now.
This PR only introduces a ttgir pass to convert `tt.load`/`tt.store` to `amdgpu.buffer_load`/`amdgpu.buffer_store`, _when this is possible_. This means we need to check three conditions:
1. The pointer arithmetic has been canonicalized (`scalarPtr->splat->addptr->load/store`).
2. The offsets are 32-bit.
3. The offsets are non-negative.

We use a mix of analysis and assumptions to verify these conditions. Right now the functionality is gated behind an `AMDGCN_USE_BUFFER_OPS` flag, which now also covers the pointer canonicalization pass that is mostly meant to handle this.
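The two offset conditions above can be modeled directly. This is a hypothetical sketch of the eligibility check only; the real pass operates on ttgir IR, not Python lists:

```python
INT32_MAX = 2**31 - 1

def offsets_fit_buffer_ops(offsets: list[int]) -> bool:
    """Buffer ops require offsets that are non-negative and fit in 32 bits."""
    return all(0 <= off <= INT32_MAX for off in offsets)
```

A negative offset or one past `INT32_MAX` forces the pass to keep the original `tt.load`/`tt.store`.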
…#4983) This PR:
- Introduces a fallback from the normal TTG->LLVM converter in case it does not support a given local_load.
- Enables conversion of the MFMA dot layout to a Linear Layout in the local_load pattern.
etiotto approved these changes on Oct 28, 2024.
Force-pushed from e1f9267 to e6df65e.
This PR changes the Triton base from 13594bb to 152ef2d (Oct 24).
Pass rate: 98.98%
Please do not squash and merge this PR.